54 research outputs found

    ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees

    To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. As highlighted in our experiments, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput 12 times that of one with the default setting. However, finding the best setting for the tens or hundreds of configuration parameters is practically impossible for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is needed to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters and resource limits. By proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only by improving system performance and resource utilization, but also by saving costs and enabling fairer benchmarking.

    BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning

    An ever-increasing number of configuration parameters are provided to system users. But many users have used one configuration setting across different workloads, leaving the performance potential of systems untapped. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. But with tens or hundreds of parameters, it becomes a highly costly task to decide which configuration setting leads to the best performance. While such a task requires strong expertise in both the system and the application, users commonly lack such expertise. To help users tap the performance potential of systems, we present BestConfig, a system for automatically finding a best configuration setting within a resource limit for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate the configuration tuning for general systems. To tune system configurations within a resource limit, we propose the divide-and-diverge sampling method and the recursive bound-and-search algorithm. BestConfig can improve the throughput of Tomcat by 75%, that of Cassandra by 63%, that of MySQL by 430%, and reduce the running time of a Hive join job by about 50% and that of a Spark join job by about 80%, solely by configuration adjustment.

    A sparse Bayesian learning method for structural equation model-based gene regulatory network inference

    Gene regulatory networks (GRNs) are underlying networks identified by interactive relationships between genes. Reconstructing GRNs from massive genetic data is important for understanding gene functions and biological mechanisms, and can provide effective service for medical treatment and genetic research. A series of artificial intelligence based methods have been proposed to infer GRNs from both gene expression data and genetic perturbations. The accuracy of such algorithms can be better than that of models that consider only gene expression data. A structural equation model (SEM), which provides a systematic framework for conveniently integrating both types of gene data, is a commonly used model for GRN inference. Considering the sparsity of GRNs, in this paper we develop a novel sparse Bayesian inference algorithm based on a Normal-Exponential-Gamma (NEG) type hierarchical prior (BaNEG) to infer GRNs modeled with SEMs more accurately. First, we reparameterize an SEM as a linear type model by integrating the endogenous and exogenous variables; then, a Bayesian adaptive lasso with a three-level NEG prior is applied to deduce the corresponding posterior mode and estimate the parameters. Simulations on synthetic data are run to compare the performance of BaNEG with that of some state-of-the-art algorithms; the results demonstrate that the proposed algorithm visibly outperforms the others. Furthermore, BaNEG is applied to infer underlying GRNs from a real data set composed of 47 yeast genes from Saccharomyces cerevisiae to discover potential relationships between genes.

    The Feasibility Study of Megavoltage Computed Tomographic (MVCT) Image for Texture Feature Analysis

    Purpose: To determine whether radiomics texture features can be reproducibly obtained from megavoltage computed tomographic (MVCT) images acquired by Helical TomoTherapy (HT) with different imaging conditions. Methods: For each of the 195 textures enrolled, the mean intrapatient difference, which is considered to be the benchmark for reproducibility, was calculated from the MVCT images of 22 patients with early-stage non-small-cell lung cancer. Test–retest MVCT images of an in-house designed phantom were acquired to determine the concordance correlation coefficient (CCC) for these 195 texture features. Features with high reproducibility (CCC > 0.9) in the phantom test–retest set were investigated for sensitivities to different imaging protocols, scatter levels, and motion frequencies using a wood phantom and in-vitro animal tissues. Results: Of the 195 features, 165 (85%) had CCC > 0.9. For the wood phantom, 124 features were reproducible in two kinds of scatter materials, and further investigations were performed on these features. For animal tissues, 108 features passed the criteria for reproducibility when covered with one layer of scatter, while 106 and 108 features of in-vitro liver and bone passed with two layers of scatter, respectively. Considering the effect of differing acquisition pitch (AcP), 97 features extracted from wood passed, while 103 and 59 features extracted from in-vitro liver and bone passed, respectively. Different reconstruction intervals (RI) had a small effect on the stability of the feature value. When AcP and RI were held consistent without motion, all 124 features calculated from wood passed, and a majority (122 of 124) of the features passed when imaging with a “fine” AcP with different RIs. However, only 55 and 40 features passed with motion frequencies of 20 and 25 beats per minute, respectively. Conclusion: Motion frequency has a significant impact on MVCT texture features, and features from MVCT were more reproducible in different scatter conditions than those from CBCT. Considering the effects of AcP and RI, the scanning protocols should be kept consistent when MVCT images are used for feature analysis. Some radiomics features from HT MVCT images are reproducible and could be used for creating clinical prediction models in the future.
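
    The CCC > 0.9 reproducibility criterion in the abstract above can be illustrated with a minimal sketch. It assumes the CCC is Lin's concordance correlation coefficient computed on paired test and retest values of a single feature; the abstract does not state which estimator the authors used, and the function name and sample values below are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Assumption: population (1/n) variances and covariance, as in Lin's
    original definition; the abstract does not specify the estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for identical constant series")
    return 2 * cov_xy / denominator


# Hypothetical test and retest values of one texture feature.
test_scan = [1.0, 2.0, 3.0, 4.0]
retest_scan = [1.1, 2.1, 2.9, 4.2]
ccc = concordance_correlation(test_scan, retest_scan)
is_reproducible = ccc > 0.9  # threshold taken from the abstract
```

    Unlike Pearson correlation, this coefficient penalizes a systematic offset between test and retest values, which is why it suits a reproducibility check.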

    Vomiting and wasting disease associated with hemagglutinating encephalomyelitis viruses infection in piglets in Jilin, China

    One coronavirus strain was isolated from brain tissues of ten piglets with evident clinical manifestations of vomiting, diarrhea and dyskinesia in Jilin province in China. Antigenic and genomic characterizations of the virus (isolate PHEV-JLsp09) were based on multiplex PCR, negative-staining electron microscopy, and sequence analysis of the hemagglutinin-esterase (HE) gene. These piglets were diagnosed with porcine hemagglutinating encephalomyelitis virus (PHEV).

    High Altitude test of RPCs for the ARGO-YBJ experiment

    A 50 m² RPC carpet was operated at the YangBaJing Cosmic Ray Laboratory (Tibet) located 4300 m a.s.l. The performance of RPCs in detecting Extensive Air Showers was studied. Efficiency and time resolution measurements at the pressure and temperature conditions typical of high mountain laboratories are reported. Comment: 16 pages, 10 figures, submitted to Nucl. Instr. Met

    Sciences of the USA 1418-1421 | PNAS

    The discovery of the block-like structure of linkage disequilibrium (LD) in human populations holds the promise of delineating the etiology of common diseases. However, understanding the magnitude, mechanism, and utility of between-population LD sharing is critical for future genome-wide association studies. In this study, substantial LD sharing between six non-African populations was observed, although much less between African-American and non-African populations, based on 20,000 SNPs of chromosome 21. We also demonstrated the respective roles of recombination and demographic events in shaping LD sharing. Furthermore, we showed that the haplotype-tagged SNPs chosen from one population are portable to the others in East Asia. Therefore, we concluded that the magnitude of LD sharing between human populations justifies the use of representative populations for selecting haplotype-tagged SNPs in genome-wide association studies of complex diseases.

    Keywords: bottleneck | genetic distance | association study | common disease | genetic variant

    Comprehensive testing of the association between genetic variations in the human genome and common diseases holds the promise of delineating the genetic architecture of these diseases (1-5). Substantial sharing of the boundaries and specific haplotypes of linkage disequilibrium (LD) blocks between populations was observed (6). However, variations of haplotype and LD across populations were also reported, raising concerns about the practical hindrance they pose to genome-wide testing of association (7-9). Conflicting observations on the magnitude of LD sharing between human populations, therefore, call for a careful examination of the following three questions, which are fundamental in developing strategies for genome-wide testing of association. First, measurement of LD sharing between populations should be made independent of the definition of LD blocks, which introduces inconsistent block boundaries (10). Second, the mechanisms that shape LD sharing between populations are yet to be fully explored, although the roles of recombination hotspots and demographic events have been implicated. To address the aforementioned questions, we typed >20,000 SNPs on chromosome 21 in seven populations: three representative continental populations [African-American (AFR), European (EUR), and Han Chinese (HAN)] and four other major East Asian (EA) populations. This design allows a close examination of LD sharing between continental groups as well as those within East Asia. In this report, we measured the LD sharing between populations independent of the definition of LD block, and we showed that bottleneck events play a critical role in shaping the LD sharing between Africans and non-Africans, but much less so between non-Africans. An important question for applying HapMap results to disease studies is how tagSNPs selected from a HapMap population will be ported to disease studies performed in other populations. In this study, we showed that tagSNPs selected from representative continental populations are indeed portable to the others in the same continent for association studies, at least in East Asia, with reasonable efficiency. In addition, we proposed a simple guideline that allows a quick evaluation of the portability of tagSNPs between populations by typing a small number of SNPs.

    Results

    Overall, 26,112 SNPs were selected and typed in this study, and the data from 19,060 SNPs passed the quality control criteria and were used for further analyses. The SNPs and quality control criteria for SNP selection are described in Materials and Methods. Seven world populations, including EUR, AFR, and five EA populations, were studied. The five EA populations, i.e., HAN, Miao (HMJ), Zhuang (CCY), Wa (WBM), and Uighur (UIG), represent five major linguistic families spoken in East Asia. Preservation of LD between populations, i.e., LD sharing (S, or S_AB when population A is given as the reference), is measured by the proportion of SNP pairs in LD in one population (population A, the reference) that are also in LD in another (population B). In this study, LD sharing was estimated without invoking the inference of haplotype blocks; therefore, the measure is independent of the definition of haplotype blocks. LD between two loci was measured in r² (16). Details of the measure of LD sharing are described in Materials and Methods. LD sharing between EAs ranges from 63-74% for r² ≥ 0.1 and 70-84% for r² ≥ 0.5 (se
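
    The LD-sharing measure defined in the excerpt above (the proportion of SNP pairs in LD in reference population A that are also in LD in population B) can be sketched as follows. This is only an illustration under stated assumptions: the input format (a dictionary from SNP pairs to r² values), the function name `ld_sharing`, and the sample values are hypothetical. The authors' exact procedure is given in their Materials and Methods, which is not part of this excerpt.

```python
def ld_sharing(r2_reference, r2_other, threshold=0.1):
    """Proportion of SNP pairs in LD in the reference population (A)
    that are also in LD in the other population (B).

    Assumptions: each argument maps a SNP pair to its r² value, and a pair
    counts as "in LD" when r² >= threshold (0.1 and 0.5 appear in the text).
    """
    in_ld_reference = {
        pair for pair, r2 in r2_reference.items() if r2 >= threshold
    }
    if not in_ld_reference:
        raise ValueError("no SNP pair is in LD in the reference population")
    # Pairs missing from the other population are treated as not in LD.
    shared = sum(
        1 for pair in in_ld_reference if r2_other.get(pair, 0.0) >= threshold
    )
    return shared / len(in_ld_reference)


# Hypothetical r² values for four SNP pairs in two populations.
r2_a = {("s1", "s2"): 0.80, ("s1", "s3"): 0.30,
        ("s2", "s3"): 0.05, ("s3", "s4"): 0.60}
r2_b = {("s1", "s2"): 0.70, ("s1", "s3"): 0.02,
        ("s2", "s3"): 0.03, ("s3", "s4"): 0.55}

s_ab = ld_sharing(r2_a, r2_b, threshold=0.1)  # A as reference
s_ba = ld_sharing(r2_b, r2_a, threshold=0.1)  # B as reference
```

    Because the denominator counts only the pairs in LD in the reference population, the measure is not symmetric in general: swapping the reference can change the value, as the two calls above show.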